Learning Partially Observable Deterministic Action Models

Abstract

We present exact algorithms for identifying the effects and preconditions of deterministic actions in dynamic, partially observable domains. They apply when one does not know the action model (the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenarios are common in real-world applications. They are challenging for AI tasks because the traditional domain structures that underlie tractability (e.g., conditional independence) fail there (e.g., world features become correlated). Our work departs from traditional assumptions about partial observations and action models. In particular, it focuses on problems in which actions are deterministic and of simple logical structure, and observation models have all features observed with some frequency. We obtain tractable algorithms for the modified problem for such domains. Our algorithms take sequences of partial observations over time as input and output deterministic action models that could have led to those observations. The algorithms output all or one of those models (depending on our choice) and are exact in that no model is misclassified given the observations. Our algorithms take polynomial time in the number of time steps and state features for some traditional action classes examined in the AI-planning literature, e.g., STRIPS actions. In contrast, traditional approaches for HMMs and Reinforcement Learning are inexact and exponentially intractable for such domains. Our experiments verify the theoretical tractability guarantees and show that we identify action models exactly. Several applications in planning, autonomous exploration, and adventure-game playing already use these results. They are also promising for probabilistic settings, partially observable reinforcement learning, and diagnosis.
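
To make the input/output contract in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the authors' algorithm: it enumerates every candidate model and every initial state, so it is exponential in the number of features, whereas the abstract claims polynomial time for classes such as STRIPS. It also assumes a simplified action class (unconditional effects, no preconditions). The names `learn_models`, `apply_action`, `EFFECTS`, and the `light`/`switch_on` example are hypothetical and chosen only for illustration.

```python
from itertools import product

# Assumed, simplified effect vocabulary: each action does one of these
# to each feature, unconditionally (no preconditions are modelled).
EFFECTS = ("set_true", "set_false", "no_change")


def apply_action(model, action, state):
    """Apply a deterministic action model to a fully specified state."""
    new_state = dict(state)
    for feature, effect in model[action].items():
        if effect == "set_true":
            new_state[feature] = True
        elif effect == "set_false":
            new_state[feature] = False
    return new_state


def agrees(state, observation):
    """A partial observation constrains only the features it mentions."""
    return all(state[f] == v for f, v in observation.items())


def learn_models(features, actions, observations, executed):
    """Return every action model consistent with the observations.

    observations: list of partial states (dict feature -> bool),
                  one more entry than `executed`.
    executed:     list of action names performed between observations.
    """
    per_action = [dict(zip(features, effs))
                  for effs in product(EFFECTS, repeat=len(features))]
    survivors = []
    for combo in product(per_action, repeat=len(actions)):
        model = dict(zip(actions, combo))
        # The model survives if at least one hidden initial state explains
        # the whole observation sequence.
        for values in product((False, True), repeat=len(features)):
            state = dict(zip(features, values))
            if not agrees(state, observations[0]):
                continue
            explained = True
            for action, obs in zip(executed, observations[1:]):
                state = apply_action(model, action, state)
                if not agrees(state, obs):
                    explained = False
                    break
            if explained:
                survivors.append(model)
                break
    return survivors


if __name__ == "__main__":
    # The light is observed off, then on after the action: one model remains.
    print(learn_models(["light"], ["switch_on"],
                       [{"light": False}, {"light": True}], ["switch_on"]))
    # The light is not observed beforehand: two models remain.
    print(learn_models(["light"], ["switch_on"],
                       [{}, {"light": True}], ["switch_on"]))
```

In the second call the initial state is unobserved, so both "the action turns the light on" and "the action changes nothing and the light was already on" remain possible. This is the sense in which the output is the set of all models that could have led to the observations.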

Bibliographic details

  • Authors

    Amir, Eyal; Chang, Allen

  • Year: 2014
  • Format: PDF
